1. Abstract

The primary task of the vision sensor in a telerobotic system is to provide information about the
position of the system's effector relative to objects of interest in its environment. The subtasks
required to perform the primary task include image segmentation, object recognition, and determination of
object location and orientation in some coordinate system. Accomplishing the vision task requires
the appropriate processing tools and a system methodology for applying the tools effectively to the
subtasks. This paper describes the functional structure of the telerobotic vision system used in the
Langley Research Center's (LaRC) Intelligent Systems Research Laboratory (ISRL) and discusses two
monovision techniques for accomplishing the vision subtasks.

2. Introduction


The telerobotic vision research objective is to adapt, develop, and evaluate noncontact sensing techniques
to recognize and determine the location of objects in 3-space. To meet the objective, five goals have been
established. The techniques should: (1) be minimally complex in both hardware and software; (2) be generally
applicable to a wide range of tasks; (3) require minimal or no alteration or premarking of the target objects;
(4) be capable of mimicking a human operator (i.e., be able to provide target location information in terms of
approach velocity as well as position); and (5) function in human real time (4 Hz). An assumption that is
allowed in order to minimize scene complexity is that the target objects are man made and that a priori
knowledge about them is available to the vision system. This is a reasonable assumption considering the nature
of current and near future space operations.

3. System Configuration 

The vision system is a distributed process within the Telerobotic System Simulation (TRSS) [1]. The system
is functionally configured as two concurrent processes: the vision executive and the vision processor
(fig. 1). The executive includes the functions of command interpretation, vision subtask determination, data
base and modelling activities, local control activity, data conversion, and transfer of vision system status
information to higher telerobotic system levels. The executive functions are performed by two modules referred
to as the interpreter and the control interface. The interpreter directs the determination of target
information by the vision system, and the control interface processes and transmits the result to the
telerobot's controller.

The interpreter's functions of command interpretation, subtask determination and sequencing, and data base
organization and manipulation are hierarchical in structure and, therefore, are natural candidates for
implementation as trees [2]. A tree is a collection of elements called nodes along with relationships among
the nodes (e.g., parenthood, childhood, sequence, direction, precedence) that place a hierarchical structure on
the nodes. A node can represent any entity (e.g., parent, child, subtask, shape, command) that does not
violate the syntax or relational structure of the tree in which it exists (i.e., it must not impede the
execution of the function). Trees can be subdivided into subtrees: a subtree consisting of shape nodes would
represent an object, and one made up of command nodes would represent an execution imperative.

The vision interpreter is implemented as an abstract data type that allows the creation, deletion, and
manipulation of trees of arbitrary size and function. The trees exist only at runtime and only when required
to execute the requested function, thus minimizing the use of memory. As an example, assume that an imperative
is received by the interpreter to locate a detected but unrecognized object. The appropriate task tree is
generated along with the necessary subtask, command, and object recognition subtrees embedded correctly in the
task tree. The tree structure itself ensures the correct execution sequence. When the object is recognized,
the recognition subtree is replaced by the object's description subtree known a priori, the location subtask
subtree is generated, and the tree driven execution is performed again.
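
The locate-then-recognize example can be sketched as a small tree type. This is only an illustration of the idea, not the ISRL implementation: the class name Node, its methods, the node labels, and the choice of a children-before-parent traversal as the "execution sequence" are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One tree element: a kind ('task', 'subtask', 'command', 'shape', ...) and a label."""
    kind: str
    label: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a subtree and return its root so calls can be chained."""
        self.children.append(child)
        return child

    def replace_child(self, old_label: str, new: "Node") -> bool:
        """Swap the first descendant subtree rooted at old_label; True on success."""
        for i, child in enumerate(self.children):
            if child.label == old_label:
                self.children[i] = new
                return True
            if child.replace_child(old_label, new):
                return True
        return False

    def execution_order(self) -> List[str]:
        """Visit children left to right before their parent (assumed sequencing rule)."""
        order: List[str] = []
        for child in self.children:
            order.extend(child.execution_order())
        order.append(self.label)
        return order


# First pass: the object is detected but not yet recognized.
task = Node("task", "locate_object")
recog = task.add(Node("subtask", "recognize"))
recog.add(Node("command", "segment_image"))
recog.add(Node("command", "match_shapes"))
first_pass = task.execution_order()

# Second pass: swap in the a priori description subtree and add the location subtask.
panel = Node("object", "panel")
panel.add(Node("shape", "rectangle"))
task.replace_child("recognize", panel)
task.add(Node("subtask", "locate")).add(Node("command", "compute_pose"))
second_pass = task.execution_order()
```

Because the sequence is read off the tree itself, replacing one subtree is enough to change what the next execution pass does.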

The control interface converts raw position data derived by the vision processor to a form compatible with
the telerobot's control protocol. The TRSS data structure that handles dynamic system input/output is so
constructed as to allow all position information to be accessed in terms of a common generic structure,
generally referred to as an NSAP homogeneous matrix [3]. The matrix
is composed of an approach vector A describing the direction normal to the target plane, a sliding vector S
denoting a direction normal to the A vector within the target plane and describing the rotation of the target
plane about the A vector, the N vector, which is the cross product of the S and A vectors, and the position
vector P denoting the x, y, and z translations separating the axis systems of the camera and the target. The
NSAP matrix contains all the information necessary to denote the orientation and position of the target with
respect to the camera frame, and facilitates the various frame transformations that must occur in the
telerobotic control process [4]-[5]. The angular parameters required for control can be extracted directly from
the matrix. The decoupled angles used for finely resolved rate situations can be determined with the help of
direction cosines as shown below:

rot. abt. z = arctan(Ny/Sy)

rot. abt. y = arccos(Az/(1 - Ay**2)**0.5)                                                    (1)

rot. abt. x = arccos(Az/(1 - Ax**2)**0.5)

where checks for singularities and proper quadrants are implied. For position control situations or general
system requirements, an NSAP to Euler transform has been implemented.
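
The implied checks in equation (1) can be made explicit in a short sketch. The function name decoupled_angles and the tolerance are assumptions. Using atan2 for the z rotation and raising an error at the singular orientations are choices of this sketch, not necessarily those of the TRSS code. The arccos terms return magnitudes only, so any sign convention would have to be resolved separately.

```python
import math


def decoupled_angles(N, S, A, eps=1e-9):
    """Return (rot_x, rot_y, rot_z) in radians from the N, S, A direction cosines."""
    # atan2 selects the quadrant that arctan(Ny/Sy) alone leaves ambiguous.
    rot_z = math.atan2(N[1], S[1])

    def tilt(other_component):
        # Denominator of equation (1); it vanishes when A lies along that axis.
        denom = math.sqrt(max(0.0, 1.0 - other_component ** 2))
        if denom < eps:
            raise ValueError("singular orientation: denominator of equation (1) is zero")
        # Clamp against rounding before taking the arccosine.
        return math.acos(max(-1.0, min(1.0, A[2] / denom)))

    rot_y = tilt(A[1])   # arccos(Az / (1 - Ay**2)**0.5)
    rot_x = tilt(A[0])   # arccos(Az / (1 - Ax**2)**0.5)
    return rot_x, rot_y, rot_z


# Target plane facing the camera, rotated 0.3 rad about the approach (z) axis.
t = 0.3
angles = decoupled_angles(N=(math.cos(t), math.sin(t), 0.0),
                          S=(-math.sin(t), math.cos(t), 0.0),
                          A=(0.0, 0.0, 1.0))
```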

The vision processor performs the vision subtask as required by the executive and determines and advises
the executive of the current status of vision processing. The vision processor is functionally segmented into
low level, middle level, and high level processing. Low level processes include thresholding, gray level
histogram generation and manipulation, and edge detection. Hardware and software implementing low level
processes have generally been acquired from outside sources. Middle level processes include gray level based
recognition, simple shape recognition, and target location. High level processes involve complex object
recognition. Development and implementation of high level and middle level vision processes are the subjects
of internal research. Two middle level processes that have been developed are discussed in this paper.

4. Monovision Methods 

Two techniques that have application to the vision subtasks of segmentation, shape decomposition,
recognition, and 3-space location are briefly discussed. The techniques are designed to extract 3-space
information from a single two dimensional intensity image using prior knowledge and the principles of the
perspective transformation.

The first method is based on the elastic matching [6] approach to pattern recognition and has application
to shape decomposition, object recognition, and object location. It is an adaptation of the linear programming
technique of goal programming to the nonlinear problem of elastic matching [7]. Conceptually, elastic matching
can be explained by envisioning a transparent reference image overlaying a goal image. The reference image is
then warped or distorted to conform to the goal image by locally matching corresponding regions in the two
images. The reference image is a flexible template that is modelled as a system of equation pairs where each
equation pair represents a linear combination of patterns that a point in the reference image can describe in
moving to a point in the goal image (fig. 2). The amount of displacement that each pattern contributes to the
distortion is determined by identifying the values of the parameters Ai and Bi associated with each of the
distortion patterns. The parameter values are derived by minimizing the absolute differences between
corresponding reference and goal image points without violating the pattern constraints. This type of problem
is easily modelled mathematically using the linear programming technique of goal programming [8]. The
computational procedure that most efficiently resolves the optimal values of the goal programming model's
parameters is the Simplex Algorithm.
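
The goal programming objective, a sum of absolute deviations, can be illustrated on a two-parameter template (translation plus gain along one axis). The sketch below is not the Simplex Algorithm: for a problem this small it simply enumerates the candidate vertices of the linear program, relying on the fact that a least-absolute-deviation line can always be chosen to pass through two of the data points. The function name and the sample data are hypothetical.

```python
from itertools import combinations


def l1_fit_gain_translation(x_ref, x_goal):
    """Fit x_goal - x_ref = A0 + A1*x_ref by minimizing the sum of absolute deviations.

    Returns (cost, A0, A1). Brute-force vertex enumeration stands in for Simplex here.
    """
    d = [g - r for r, g in zip(x_ref, x_goal)]
    best = None
    for i, j in combinations(range(len(x_ref)), 2):
        if x_ref[i] == x_ref[j]:
            continue  # these two points cannot define a unique line
        a1 = (d[j] - d[i]) / (x_ref[j] - x_ref[i])
        a0 = d[i] - a1 * x_ref[i]
        cost = sum(abs(a0 + a1 * x - dk) for x, dk in zip(x_ref, d))
        if best is None or cost < best[0]:
            best = (cost, a0, a1)
    return best


# Reference points shrunk by a gain of 0.8 and shifted by 0.5; one goal point is mismatched.
x_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
x_goal = [0.8 * x + 0.5 for x in x_ref]
x_goal[2] += 1.0
cost, a0, a1 = l1_fit_gain_translation(x_ref, x_goal)
```

In this example the single mismatched point accounts for the entire residual cost and leaves the recovered parameters unchanged, which is one practical attraction of minimizing absolute rather than squared differences.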

The technique has been used to recognize simple three-dimensional objects of minimum curvature (i.e., near
planar) and determine their location in 3-space. A single prototype shape (e.g., a rectangle) can be used to
identify any of a primitive set of simple shapes by distorting it to match the image of an unknown shape. A
simple shape is here defined to be a convex geometric figure formed on the surface of a sphere of large radius,
and the primitive set consists of rectangles, triangles, and ellipses. The values of parameters A3 through A5
and B3 through B5 yield information that allows recognition of the set members regardless of orientation. Once
an object is identified, either as a simple shape or a combination of simple shapes, an exact model of its
normal view is distorted to match the now known image, and information regarding its location and orientation
can be derived from the parameters A0 through A3 and B0 through B3. Equations (2) through (7) show the
geometric significance of the parameters.


A0 = X' - X
B0 = Y' - Y               : translation                                          (2)

A1 = -(1 - gain)
B1 = -(1 - gain)          : gain                                                 (3)

where gain = X'/X or Y'/Y

A2 = (X' - X)/Y
B2 = (Y' - Y)/X           : rotation in x-y plane                                (4)

A3 = -(1 - gain)/Y
B3 = -(1 - gain)/X        : perspective and triangular shape information         (5)

A4 = (X' - X)/a**2
B4 = (Y' - Y)/b**2        : semicircular shape information                       (6)

where a**2 = X**2 - Y**2 and b**2 = Y**2 - X**2

A5 = -(1 - gain)/Y**2
B5 = -(1 - gain)/X**2     : elliptical shape information                         (7)
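
Read one at a time, equations (2) through (7) each give the displacement produced by a single distortion pattern. Summing those contributions gives a forward model of the flexible template, sketched below. The summed form, the function name warp_point, and the reading a**2 = X**2 - Y**2, b**2 = Y**2 - X**2 are assumptions of this sketch rather than expressions quoted from the source.

```python
def warp_point(x, y, A, B):
    """Move reference point (x, y) to its goal position using parameters A[0..5], B[0..5]."""
    a2 = x * x - y * y   # assumed reading of a**2
    b2 = y * y - x * x   # assumed reading of b**2
    dx = (A[0]                 # translation
          + A[1] * x           # gain
          + A[2] * y           # rotation in the x-y plane
          + A[3] * x * y       # perspective / triangular
          + A[4] * a2          # semicircular
          + A[5] * x * y * y)  # elliptical
    dy = (B[0]
          + B[1] * y
          + B[2] * x
          + B[3] * x * y
          + B[4] * b2
          + B[5] * y * x * x)
    return x + dx, y + dy


zeros = [0.0] * 6
shifted = warp_point(2.0, 3.0, [1.0, 0, 0, 0, 0, 0], zeros)    # pure x translation
shrunk = warp_point(2.0, 3.0, [0, -0.5, 0, 0, 0, 0], zeros)    # x gain of 0.5
```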




Equations (8) through (10), which are based on properties of the perspective transformation [9], show the 
parameters' relationship to the range, pitch, and yaw respectively of the target object relative to the 
camera's axis system. 


range = (f*Wo*(2 - A1))/((1 - A1)*Ws)                                                        (8)

where f is the focal plane distance of the camera/lens system, Wo is the object width, and Ws is the camera's
image sensor width,

tan(phi) = 2*f*A3/(1 - A1)                                                                   (9)

where phi is the pitch angle, and

tan(theta) = 2*f*B3/(1 - B1)                                                                 (10)

where theta is the yaw angle.
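
Equations (8) through (10) translate directly into a few lines of code. The sketch below assumes that f, Wo, and Ws share one length unit and that A1 and B1 differ from 1; the function name and the sample numbers are illustrative only.

```python
import math


def range_pitch_yaw(A1, A3, B1, B3, f, Wo, Ws):
    """Return (range, pitch, yaw) from template parameters; angles are in radians."""
    if A1 == 1.0 or B1 == 1.0:
        raise ValueError("equations (8)-(10) are singular when A1 or B1 equals 1")
    rng = (f * Wo * (2.0 - A1)) / ((1.0 - A1) * Ws)   # equation (8)
    pitch = math.atan(2.0 * f * A3 / (1.0 - A1))      # equation (9)
    yaw = math.atan(2.0 * f * B3 / (1.0 - B1))        # equation (10)
    return rng, pitch, yaw


# Hypothetical numbers: 16 mm lens, 100 mm wide object, 8 mm wide sensor, undistorted template.
rng, pitch, yaw = range_pitch_yaw(A1=0.0, A3=0.0, B1=0.0, B3=0.0, f=16.0, Wo=100.0, Ws=8.0)
```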

Using a slightly different template (fig. 3), the technique has also been used to recognize arc segments
and to decompose a geometrically complex object into its constituent shapes. The template is modelled as a
system of n general equations of the second degree, each of which represents a point on the arc segment of
interest. The relative values of the derived parameters A, B, C, D, E, and F indicate the conic type of which
the arc segment is a part (fig. 3), and their numerical values can be used to obtain the axis orientation, the
foci, the vertices, the axis intercepts, and the eccentricity of the conic.
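
One standard way to read the conic type from the fitted coefficients is the discriminant test, sketched here under the assumption that the template has the conventional form A*x**2 + B*x*y + C*y**2 + D*x + E*y + F = 0. The source does not say which test it applies, and degenerate conics (line pairs, single points) are not handled by this sketch.

```python
import math


def classify_conic(A, B, C, tol=1e-9):
    """Classify a non-degenerate conic from its second-degree coefficients."""
    disc = B * B - 4.0 * A * C
    if abs(disc) <= tol:
        return "parabola"
    if disc > 0.0:
        return "hyperbola"
    if abs(B) <= tol and abs(A - C) <= tol:
        return "circle"
    return "ellipse"


def axis_rotation(A, B, C):
    """Angle (radians) of the axis rotation that removes the x*y term: tan(2t) = B/(A - C)."""
    return 0.5 * math.atan2(B, A - C)


kinds = [classify_conic(1, 0, 1), classify_conic(1, 0, 4),
         classify_conic(0, 0, 1), classify_conic(1, 0, -1)]
```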

One way of determining a demarcation between simple shapes in an object's image is to locate boundary
reversals (fig. 4). A reversal is indicated when there is a rotation of axis between two adjacent arc segments
such that the axes lie in diagonally opposite quadrants. The vertices of arc segments at the boundary reversals
are used as end points of lines that subdivide the object's image into convex shapes that can be approximated by
the primitive set.

By linearizing the problem, the computational efficiency of performing elastic matching is increased so
that it becomes feasible as a real time procedure. Previous methods (e.g., exhaustive enumeration and dynamic
programming) have required running times that are exponentially related to the number (n) of point pairs
involved in the match:

T(n) = r**n                                                                                  (11)

where r is the number of possible global match configurations. For an n variable problem, the worst case
running time of the Simplex Algorithm is linearly related to n:

T(n) = n                                                                                     (12)

When the flexible template is transposed to its dual [7]-[8], each pair of points to be matched requires a
variable. Thus, the addition of point pairs has little impact on the running time of the elastic matcher
[7]-[8]. When the technique is used for object location, the position update frequency is 4 Hz, which is in the
realm of human real time (1.333 to 4 Hz). It must be noted that most of the time in the position
determination/manipulator activation cycle of the current testbed is consumed by the image processing activity
and not by the parameter identification and location calculations. A faster image processor would allow
frequencies approaching video frame rates (30 Hz).

The second method determines the location and orientation of a planar object from any four points on the
object that describe a reasonably convex quadrangle. Given the inter-vertex distances of the quadrangle and
the optical parameters of the camera, the rotational and translational displacements between the object and
camera can be uniquely determined.

The distance and orientation of the quadrangle relative to the lens axis frame can be solved in a closed
form. The object points are defined as perspective projections of the image points along rays originating at
the lens center, that is

Ti = ki*Ii                                                                                   (13)
where the quadrangle <I0, I1, I2, I3> denotes the projection of the target <T0, T1, T2, T3> on the image plane
(fig. 5). The axis system is chosen such that the x and y components of the projected image (Ix, Iy) lie on
the image plane and Iz equals the focal length of the camera. In their paper on passive ranging, Hung and Yeh
[10] prove that there exists a unique vector K which relates the target quadrangle and its image quadrangle and
that it can be described in terms of the projected image points and the inter-vertex distances. The distances
between the pairs of vertices can be described by a unique pair of nonzero real numbers, alpha and beta,
independent of the coordinate system chosen, such that

I3 = I0 + alpha*(I1 - I0) + beta*(I2 - I0)                                                   (14)

where noncollinearity implies that

alpha + beta ≠ 1                                                                             (15)

Equations (13) and (14) can be rewritten as

k3*T3 = k0*T0 + alpha*(k1*T1 - k0*T0) + beta*(k2*T2 - k0*T0)                                 (16)

By substituting for the Ti and dividing by k3, equation (16) can be transformed to

I3 = (k0/k3)*(1 - alpha - beta)*I0 + (k1/k3)*alpha*I1 + (k2/k3)*beta*I2                      (17)

where the I vector represents the (x, y, z) coordinates of the image points. Noting that k3 is common to all
the right hand terms, it can be considered a scaling factor that reduces the target quadrangle from its
original dimensions to its projected dimensions at the image plane, where k3 equals 1. Thus, from similarity,
Hung and Yeh describe k3 in terms of the relationship of the magnitudes of the real and projected diagonals:

k3 = ||T0 - T3|| / ||(k0/k3)*(1 - alpha - beta)*I0 - I3||                                    (18)
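
Equation (17) is a 3 by 3 linear system in the three bracketed coefficients, so the ratios ki/k3 can be recovered with Cramer's rule. The sketch below does this and then fixes the overall scale from one known inter-vertex distance. Several things here are assumptions of the sketch: alpha and beta are taken to describe the target vertices (T3 = T0 + alpha*(T1 - T0) + beta*(T2 - T0)), the scale is set from |T1 - T0| rather than from the diagonal used in equation (18), and k3 is taken as positive (target in front of the camera). The function names and test geometry are hypothetical.

```python
import math


def det3(c0, c1, c2):
    """Determinant of the 3x3 matrix whose columns are c0, c1, c2."""
    return (c0[0] * (c1[1] * c2[2] - c2[1] * c1[2])
            - c1[0] * (c0[1] * c2[2] - c2[1] * c0[2])
            + c2[0] * (c0[1] * c1[2] - c1[1] * c0[2]))


def recover_vertices(image, alpha, beta, d01):
    """Return the four target vertices Ti = ki*Ii given image points (x, y, f)."""
    I0, I1, I2, I3 = image
    D = det3(I0, I1, I2)
    # Coefficients of I0, I1, I2 in equation (17), by Cramer's rule.
    c0 = det3(I3, I1, I2) / D
    c1 = det3(I0, I3, I2) / D
    c2 = det3(I0, I1, I3) / D
    # Ratios ki/k3; requires alpha, beta nonzero and alpha + beta != 1.
    ratios = [c0 / (1.0 - alpha - beta), c1 / alpha, c2 / beta, 1.0]
    # Scale: |T1 - T0| = k3 * |(k1/k3)*I1 - (k0/k3)*I0| (choice made for this sketch).
    diff = [ratios[1] * a - ratios[0] * b for a, b in zip(I1, I0)]
    k3 = d01 / math.sqrt(sum(v * v for v in diff))
    return [[k3 * r * v for v in Ii] for r, Ii in zip(ratios, image)]


def project(T, f=1.0):
    """Pinhole projection of a camera-frame point onto the plane z = f."""
    return [f * T[0] / T[2], f * T[1] / T[2], f]


# Synthetic tilted quadrangle with alpha = beta = 1 (a parallelogram).
target = [[0.0, 0.0, 10.0], [2.0, 0.0, 12.0], [0.0, 1.0, 10.0], [2.0, 1.0, 12.0]]
image = [project(T) for T in target]
recovered = recover_vertices(image, alpha=1.0, beta=1.0, d01=math.sqrt(8.0))
```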


This information is sufficient to solve for the three dimensional positions of the quadrangle vertices (Ti) in
the camera axis frame. The quadrangle orientation, described by the equation of the normal to the plane
occupied by the quadrangle in 3-space, is determined by substituting the coordinates of any three vertices into
the general equation of the plane. Solving the system of simultaneous equations gives the following explicit
expressions for the orientation vector in terms of the quadrangle vertices derived above:

Ax' = (T1y*T2z-T1z*T2y+T0z*T2y-T0y*T2z+T0y*T1z-T0z*T1y)/D(T)

Ay' = (T1z*T2x-T1x*T2z+T0x*T2z-T0z*T2x+T0z*T1x-T0x*T1z)/D(T)                                 (19)

Az' = (T1x*T2y-T1y*T2x+T0y*T2x-T0x*T2y+T0x*T1y-T0y*T1x)/D(T)

where

D(T) = T0x*(T1y*T2z-T1z*T2y)+T0y*(T1z*T2x-T1x*T2z)+T0z*(T1x*T2y-T1y*T2x)                     (20)

and Ax, Ay, and Az are determined from Ax', Ay', and Az' by normalizing by the magnitude of the vector
(Ax', Ay', Az').
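
Equations (19) and (20) are Cramer's rule applied to the plane a*x + b*y + c*z = 1 through three vertices: D(T) is the determinant of the vertex coordinates, and each numerator is that determinant with one column replaced by ones. A minimal sketch follows; the function name is hypothetical, and the check for D(T) = 0 (a plane passing through the lens center) is an addition of the sketch.

```python
import math


def plane_normal(T0, T1, T2, eps=1e-12):
    """Unit normal (Ax, Ay, Az) of the plane through three camera-frame vertices."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = T0, T1, T2
    # Equation (20): determinant of the vertex coordinates.
    D = (x0 * (y1 * z2 - z1 * y2)
         + y0 * (z1 * x2 - x1 * z2)
         + z0 * (x1 * y2 - y1 * x2))
    if abs(D) < eps:
        raise ValueError("D(T) is zero: the plane passes through the lens center")
    # Equation (19); note the minus sign on x1*z2, as Cramer's rule requires.
    ax = (y1 * z2 - z1 * y2 + z0 * y2 - y0 * z2 + y0 * z1 - z0 * y1) / D
    ay = (z1 * x2 - x1 * z2 + x0 * z2 - z0 * x2 + z0 * x1 - x0 * z1) / D
    az = (x1 * y2 - y1 * x2 + y0 * x2 - x0 * y2 + x0 * y1 - y0 * x1) / D
    mag = math.sqrt(ax * ax + ay * ay + az * az)
    return ax / mag, ay / mag, az / mag


facing = plane_normal((0, 0, 5), (1, 0, 5), (0, 1, 5))       # plane parallel to the image
tilted = plane_normal((0, 0, 10), (2, 0, 12), (0, 1, 10))    # plane pitched by 45 degrees
```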

Once the positions of the quadrangle vertices and the direction of its normal are known, the vectors that
comprise the NSAP matrix can be found. The approach vector A is the orientation vector derived above. The
sliding vector S is related to the slope of the base of the quadrangle with respect to the camera frame; it is
the x, y, and z components of the vector T1 - T0 normalized by its length. The position vector P is simply the
x, y, and z components of the selected point of approach on the quadrangle <T0, T1, T2, T3>. The intersection
of the diagonals is commonly chosen.
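
The assembly of the NSAP matrix from the recovered vertices can be sketched as follows. The column layout [N S A P] over a bottom row [0 0 0 1] is the usual homogeneous-transform convention and is assumed here. The normal is taken from a cross product, which agrees with the normalized orientation vector up to sign. The closed form for the diagonal intersection, T0 + (T3 - T0)/(alpha + beta), is derived for this sketch from T3 = T0 + alpha*(T1 - T0) + beta*(T2 - T0) and is not quoted from the source.

```python
import math


def sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(v):
    mag = math.sqrt(sum(c * c for c in v))
    return [c / mag for c in v]


def nsap_from_quadrangle(T0, T1, T2, T3, alpha, beta):
    """Return the 4x4 NSAP matrix as nested lists (rows)."""
    A = unit(cross(sub(T1, T0), sub(T2, T0)))   # approach: normal to the target plane
    S = unit(sub(T1, T0))                       # sliding: along the base T0 -> T1
    N = cross(S, A)                             # N = S x A, as stated in the text
    s = 1.0 / (alpha + beta)                    # where diagonal T0-T3 meets diagonal T1-T2
    P = [t0 + s * (t3 - t0) for t0, t3 in zip(T0, T3)]
    rows = [[N[i], S[i], A[i], P[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


# Unit square facing the camera at a range of 5 (alpha = beta = 1).
M = nsap_from_quadrangle([0, 0, 5], [1, 0, 5], [0, 1, 5], [1, 1, 5], 1.0, 1.0)
```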

For each probable target, it is necessary to determine and specify the alpha and beta parameters, which are
based upon the inter-vertex distances of the quadrangle. One approach to entering new
models in the data base is to automate this task in a one shot initialization procedure by processing one frame
of the target image from a camera position normal to and at a known distance from the target. The parameters
are calculated and stored in the data base. The calculations are based on equation (13) (and its
transformations) with the K vector known. The results are presented here without derivation.


alpha = V2/V1

beta = V3/V1                                                                                 (21)

V1 = I0x*(I2y - I1y) + I1x*(I0y - I2y) + I2x*(I1y - I0y)

V2 = -{I0x*(I3y - I2y) + I2x*(I0y - I3y) + I3x*(I2y - I0y)}                                  (22)

V3 = I0x*(I3y - I1y) + I1x*(I0y - I3y) + I3x*(I1y - I0y)
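
The initialization step can be sketched as a single function of the four image points from the normal view. It takes alpha = V2/V1 alongside beta = V3/V1; the alpha expression follows from solving equation (14) for its two unknowns. The function name and the sample points are hypothetical.

```python
def alpha_beta(I0, I1, I2, I3):
    """Return (alpha, beta) from the x, y image coordinates of the four vertices."""
    v1 = I0[0] * (I2[1] - I1[1]) + I1[0] * (I0[1] - I2[1]) + I2[0] * (I1[1] - I0[1])
    v2 = -(I0[0] * (I3[1] - I2[1]) + I2[0] * (I0[1] - I3[1]) + I3[0] * (I2[1] - I0[1]))
    v3 = I0[0] * (I3[1] - I1[1]) + I1[0] * (I0[1] - I3[1]) + I3[0] * (I1[1] - I0[1])
    if v1 == 0.0:
        raise ValueError("I0, I1, I2 are collinear")
    return v2 / v1, v3 / v1


# I3 built as I0 + 0.8*(I1 - I0) + 1.5*(I2 - I0), so the expected result is (0.8, 1.5).
a, b = alpha_beta((1.0, 1.0), (3.0, 1.0), (1.0, 2.0), (2.6, 2.5))
```

Substituting the returned values back into equation (14) reproduces I3, which is a convenient check when a new target model is entered.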

The raw state information generated by both the elastic matcher and the quadrangle projection methods,
consisting of the three translational and the three angular displacements of the target from the camera, is
converted to the NSAP matrix. This matrix is input to the control interface section of the vision executive for
further processing.
5. Future Work

The vision system development in the ISRL has centered on the processing of single, two dimensional, intensity
based (i.e., video) images. The next research phase will involve the extension of the system to process single
three dimensional range based images as well as further refinement of the two dimensional techniques. The
successful development of a laser vision sensor based on the FM-CW radar technique will support the next
phase [11].
Figure 1 - System configuration.

Figure 2 - Elastic template.

Figure 3 - Arc segment identification.

Figure 4 - Shape decomposition.

Figure 5 - Quadrangle projection.